Our project sought to compare the quality of life in 26 major U.S. cities using data from the U.S. Centers for Disease Control and Prevention. The data set includes variables that pertain to demographics and indicators of health, income, and education level.
We intended to do this by producing interactive plots, making linear models of key predictors, and comparing cities based on their “Livability”, a variable we created representing relative levels of positive and negative predictors of quality of life.
Our hypothesis was that the “Livability” of Detroit, MI would be lowest because it is known for its weak economy, high levels of poverty, and high crime rates. Meanwhile, we predicted Boston, MA and Portland, OR would be relatively high on the list as they are known to be cities with thriving economies and rich local cultures.
The main dataset we used was the U.S. Centers for Disease Control and Prevention’s Big Cities Health Inventory Data. We used the R language and RStudio for all data wrangling, analyses, and visualizations. We also reviewed previous work summarizing the health, finances, and education levels of Americans. The Social Science Research Council maintains a comprehensive interactive map of the U.S. that depicts a range of human wellbeing statistics by state, and this map informed our approach to analysis and visualization. Inspired by this resource, we decided that presenting data on life expectancy, income, and education level across demographic groups was crucial to summarizing quality of life in big cities.
First, we calculated the mean value for each city and indicator. Each city has multiple years of data for each indicator, so averaging across years gives a single value per city and indicator for further analysis. We then calculated the sum of these mean values across all cities for each indicator. Each city’s mean for an indicator was divided by that indicator’s sum of means; we call the result the city’s relative value. Because every city has a relative value for every indicator, we could total the relative values within the good and bad indicator categories: for each city, we summed its relative values over the good indicators and, separately, over the bad indicators. To obtain a final score, we subtracted the sum for the good indicators from the sum for the bad indicators for each city. This result determines the city’s final ranking. Finally, we plotted Livability by city.
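The scoring steps described above can be sketched in base R. This is only an illustration under stated assumptions: the data frame `bchi.df`, its column names, the indicator labels, and every value are hypothetical toy inputs, not the project's actual data or code. The difference is taken exactly as the text words it (bad minus good), so under this reading a smaller score means a more livable city; swap the operands if the opposite convention is intended.

```r
# Hypothetical toy data: 3 cities, 1 "good" and 1 "bad" indicator, 2 years each.
bchi.df <- data.frame(
  Place = rep(c("City A", "City B", "City C"), each = 4),
  Indicator = rep(c("Good indicator", "Good indicator",
                    "Bad indicator", "Bad indicator"), times = 3),
  Year = rep(c(2012, 2013), times = 6),
  Value = c(60, 64, 10, 14,   # City A
            50, 54, 20, 24,   # City B
            40, 44, 30, 34)   # City C
)
good_indicators <- "Good indicator"  # assumed classification of indicators

# Step 1: mean across years for each city and indicator.
means.df <- aggregate(Value ~ Place + Indicator, data = bchi.df, FUN = mean)

# Step 2: sum of the city means within each indicator.
means.df$Indicator_Total <- ave(means.df$Value, means.df$Indicator, FUN = sum)

# Step 3: relative value = city mean / sum of means for that indicator.
means.df$Relative <- means.df$Value / means.df$Indicator_Total

# Step 4: total the relative values per city, separately for good and bad.
is_good <- means.df$Indicator %in% good_indicators
good.df <- aggregate(Relative ~ Place, data = means.df[is_good, ], FUN = sum)
bad.df <- aggregate(Relative ~ Place, data = means.df[!is_good, ], FUN = sum)
scores.df <- merge(good.df, bad.df, by = "Place", suffixes = c("_Good", "_Bad"))

# Step 5: bad minus good, as worded in the text (smaller = more livable here).
scores.df$Livability <- scores.df$Relative_Bad - scores.df$Relative_Good
scores.df <- scores.df[order(scores.df$Livability), ]
print(scores.df)
```

Because each indicator's relative values sum to 1 across cities, every indicator contributes on the same scale regardless of its original units, which is what allows the good and bad totals to be compared directly.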
After this, we fit two linear models to test whether life expectancy had a statistically significant association with race and with city.
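As a rough illustration of this step, the sketch below fits one model per categorical predictor with `lm()` and reads the overall F-test p-value from `anova()`. The data frame `toy.df`, its column names, and all values are made up for demonstration and do not come from the Big Cities Health Inventory.

```r
# Hypothetical toy data: life expectancy for 3 groups in each of 3 cities.
toy.df <- data.frame(
  Place = rep(c("City A", "City B", "City C"), each = 3),
  Race = rep(c("Group X", "Group Y", "Group Z"), times = 3),
  Life_Exp = c(75, 80, 85,   # City A
               76, 81, 86,   # City B
               74, 79, 84)   # City C
)

# One model per categorical predictor; R dummy-codes each predictor
# against its first level (the reference category).
fit_race <- lm(Life_Exp ~ Race, data = toy.df)
fit_place <- lm(Life_Exp ~ Place, data = toy.df)

# Overall F-test p-value for each predictor.
p_race <- anova(fit_race)[["Pr(>F)"]][1]
p_place <- anova(fit_place)[["Pr(>F)"]][1]

# The intercept is the reference group's mean; the other coefficients
# are differences from that mean.
print(coef(fit_race))
print(c(race = p_race, place = p_place))
```

With a single categorical predictor, each coefficient is a difference in group means, so the model is equivalent to a one-way analysis of variance.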
Subsequently, we produced an interactive faceted barplot depicting life expectancy by city and race. We then made a series of static barplots showing median household income, all-cause mortality rate, and the percentage of adults who meet CDC-recommended levels of physical activity by city, in order to visualize the relative health and income levels of the cities. Finally, we produced three barplots depicting firearm-related mortality rate, homicide rate, and suicide rate. It was surprising to find that Sacramento, CA had the highest suicide rate. It was unsurprising to find that Detroit, MI had both the highest homicide rate and the highest firearm-related mortality rate.
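A minimal base R sketch of one such static barplot follows, together with how the highest and lowest cities can be read off the data. The plotting library used for the project's figures is not stated here, so base graphics are an assumption made to keep the example self-contained; `rate.df` and its values are hypothetical.

```r
# Hypothetical per-city values for a single indicator (not real figures).
rate.df <- data.frame(
  Place = c("City A", "City B", "City C", "City D"),
  Rate = c(12.5, 7.0, 21.3, 9.8)
)

# Order the cities so the bars run from highest to lowest.
rate.df <- rate.df[order(rate.df$Rate, decreasing = TRUE), ]

# Identify the extremes reported alongside each figure.
highest <- as.character(rate.df$Place[which.max(rate.df$Rate)])
lowest <- as.character(rate.df$Place[which.min(rate.df$Rate)])

# Draw the barplot to a temporary PDF so the sketch also runs non-interactively.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)
barplot(rate.df$Rate, names.arg = as.character(rate.df$Place), las = 2,
        ylab = "Rate (hypothetical units)", main = "Indicator by city")
invisible(dev.off())

cat("Highest:", highest, "| Lowest:", lowest, "\n")
```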
Figure 1: Scatterplot of livability by city
Figure 2: Linear model of life expectancy and city
##
## Call:
## lm(formula = Mean_Life_Exp_by_Race_and_Place ~ Place, data = life_exp.df)
##
## Residuals:
##     Min       1Q   Median       3Q      Max
## -10.225   -3.031    1.000    3.775    9.125
##
## Coefficients:
##                                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)                           82.5250     2.6428  31.227   <2e-16 ***
## PlaceCleveland, OH                    -3.2500     3.7375  -0.870    0.390
## PlaceFort Worth (Tarrant County), TX  -7.1250     4.5774  -1.557    0.128
## PlaceKansas City, MO                  -6.3750     4.5774  -1.393    0.172
## PlaceLas Vegas (Clark County), NV     -0.9500     3.7375  -0.254    0.801
## PlaceLong Beach, CA                   -1.5000     3.4118  -0.440    0.663
## PlaceLos Angeles, CA                  -1.0250     3.7375  -0.274    0.785
## PlaceNew York, NY                     -2.2917     4.0369  -0.568    0.574
## PlaceOakland, CA                      -2.6917     4.0369  -0.667    0.509
## PlacePortland (Multnomah County), OR  -2.2050     3.5457  -0.622    0.538
## PlaceSan Antonio, TX                  -4.1583     4.0369  -1.030    0.309
## PlaceSan Diego County, CA              0.1667     3.7375   0.045    0.965
## PlaceSan Francisco, CA                -3.1000     3.7375  -0.829    0.412
## PlaceSeattle, WA                       1.1083     4.0369   0.275    0.785
## PlaceWashington, DC                   -0.7583     4.0369  -0.188    0.852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.286 on 39 degrees of freedom
## Multiple R-squared: 0.1523, Adjusted R-squared: -0.152
## F-statistic: 0.5006 on 14 and 39 DF, p-value: 0.9185
The overall F-test (p = 0.9185) shows no statistically significant association between city and life expectancy, and no individual city coefficient is significant at the 0.05 level.
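As a quick consistency check, the overall p-value and the adjusted R-squared can be recomputed from the statistics printed in the summary above. The variable names are illustrative; the numbers are copied from the output, so small rounding differences are expected.

```r
# Values copied from the printed model summary for life expectancy ~ city.
f_stat <- 0.5006      # F-statistic
df_model <- 14        # numerator degrees of freedom
df_resid <- 39        # residual degrees of freedom
r_squared <- 0.1523   # multiple R-squared

# Upper-tail probability of the F distribution gives the overall p-value.
p_value <- pf(f_stat, df_model, df_resid, lower.tail = FALSE)

# Number of observations implied by the degrees of freedom (plus intercept).
n_obs <- df_model + df_resid + 1

# Adjusted R-squared penalizes the number of estimated coefficients.
adj_r_squared <- 1 - (1 - r_squared) * (n_obs - 1) / df_resid

print(c(p_value = p_value, n_obs = n_obs, adj_r_squared = adj_r_squared))
```

The negative adjusted R-squared reflects that the 14 city terms explain less variance than would be expected from that many predictors by chance alone.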
Figure 3: Linear model for mean life expectancy and race
##
## Call:
## lm(formula = Mean_Life_Exp_by_Race_and_Place ~ Race..Ethnicity,
## data = life_exp.df)
##
## Residuals:
##     Min       1Q   Median       3Q      Max
## -6.4956  -1.2864   0.0022   1.2381   6.8410
##
## Coefficients:
##                                Estimate Std. Error t value Pr(>|t|)
## (Intercept)                      76.400      2.535  30.139  < 2e-16 ***
## Race..EthnicityAsian/PI          10.071      2.689   3.746 0.000491 ***
## Race..EthnicityBlack             -1.323      2.618  -0.505 0.615595
## Race..EthnicityHispanic           8.259      2.631   3.140 0.002923 **
## Race..EthnicityMultiracial        3.100      3.585   0.865 0.391577
## Race..EthnicityNative American    1.100      3.585   0.307 0.760320
## Race..EthnicityWhite              3.496      2.618   1.335 0.188251
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.535 on 47 degrees of freedom
## Multiple R-squared: 0.765, Adjusted R-squared: 0.735
## F-statistic: 25.51 on 6 and 47 DF, p-value: 3.101e-13
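The linear model shows that the Asian/PI and Hispanic categories have a statistically significant association with life expectancy (p < 0.01 for both); the remaining categories do not differ significantly from the reference category. Keeping only the significant terms, the resulting model is: life expectancy = 76.4 + 10.071 × Asian/PI + 8.259 × Hispanic, where Asian/PI and Hispanic are indicator variables equal to 1 for that group and 0 otherwise.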
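To make the reduced model concrete, the short sketch below plugs the indicator values into it. The coefficients are copied from the summary above; the helper `predict_life_exp` is a hypothetical name introduced only for this illustration.

```r
# Coefficients copied from the printed summary (significant terms only).
coefs <- c(intercept = 76.400, asian_pi = 10.071, hispanic = 8.259)

# Predicted mean life expectancy; each argument is a 0/1 indicator.
predict_life_exp <- function(asian_pi = 0, hispanic = 0) {
  unname(coefs["intercept"] +
           coefs["asian_pi"] * asian_pi +
           coefs["hispanic"] * hispanic)
}

reference_group <- predict_life_exp()              # both indicators 0
asian_pi_group <- predict_life_exp(asian_pi = 1)
hispanic_group <- predict_life_exp(hispanic = 1)

print(c(reference = reference_group,
        asian_pi = asian_pi_group,
        hispanic = hispanic_group))
```

Under this reduced model the predicted means are 76.4 years for the reference category, 86.471 years for the Asian/PI group, and 84.659 years for the Hispanic group.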
Figure 4: Interactive faceted barplot of mean life expectancy by race and city
Figure 5: Barplot of median household income by city
San Jose, CA has the highest Median Household Income, while Detroit, MI has the lowest.
Figure 6: Barplot of Mortality Rate by city and race
Cleveland, OH has the highest All-Cause Mortality Rate, while Phoenix, AZ has the lowest.
Figure 7: Barplot showing percentage of adults who meet physical activity recommendations by city
Denver, CO has the highest percentage of adults who meet CDC-recommended physical activity levels, while Detroit, MI has the lowest.
Figure 8: Scatterplot showing firearm related mortality rate by city
Detroit, MI has the highest firearm-related mortality rate, while New York, NY has the lowest.
Figure 9: Barplot of homicide rate by city
Detroit, MI has the highest homicide rate, while San Diego, CA has the lowest.
Figure 10: Barplot of suicide rate by city
Sacramento, CA has the highest suicide rate, while Los Angeles, CA has the lowest.
In conclusion, we found that the city with the lowest quality of health, education, and income was Detroit, Michigan, and the city with the best of these qualities was San Jose, California. It is not surprising that San Jose ranked highest in many measures of quality of life, as it is the center of Silicon Valley and home to many large technology companies.
One of the biggest challenges in this project was finding a sound way to quantify quality of life in a city. We tried several iterations of the method before settling on the final version, which we feel is the most accurate way to rank these cities given the data we had.
Detroit, MI matched our hypothesis, while Boston, MA and Portland, OR differed from it. One likely reason Boston, MA and Portland, OR ranked lower than hypothesized is that the data set includes a limited number of predictors that do not fully capture quality of life. In addition, we did not weight the variables differently in our calculation of “Livability”, when in reality some factors are more important than others.
References
R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
RStudio Team (2015). RStudio: Integrated Development for R. RStudio, Inc., Boston, MA. URL http://www.rstudio.com/.
Big Cities Health Data: https://data.world/health/big-cities-health
Measure of America Interactive Map: https://measureofamerica.org/maps/?state^hdi^all_all^HDI^hdi
Measure of America Data Set: http://measureofamerica.org/download-agreement/